Identifying Information Units for Multiple Document Summarization
Authors
Abstract
Multiple document summarization is becoming increasingly important as a way of reducing information overload, particularly in the context of the proliferation of similar accounts of events that are available on the Web. Removal of similar sentences often results in either partial or unwanted elimination of important information. In this paper, we present an approach to split sentences into their component clauses and use these clauses to produce comprehensive summaries of multiple documents describing particular events. Detailed analysis of all clauses and clause boundaries may be complex and computationally expensive. Our rule-based approach demonstrates that it is possible to achieve high accuracy in reasonable time.
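The abstract does not list the actual rules, so the following is only a minimal sketch of what rule-based clause splitting might look like, not the authors' method. The marker words in `MARKERS`, the boundary pattern, and the helper names `split_clauses` and `unique_clauses` are all assumptions introduced for illustration.

```python
import re

# Assumed marker words; the paper's real rules are not given in the abstract.
MARKERS = ("because", "although", "while", "but", "which", "when")

# A boundary is a comma or semicolon, or whitespace directly before a marker word.
_BOUNDARY = re.compile(
    r"\s*[,;]\s*|\s+(?=(?:" + "|".join(MARKERS) + r")\b)",
    flags=re.IGNORECASE,
)


def split_clauses(sentence):
    """Split one sentence into rough clause-like units (hypothetical rules)."""
    body = sentence.strip().rstrip(".!?")
    return [part.strip() for part in _BOUNDARY.split(body) if part.strip()]


def unique_clauses(sentences):
    """Keep the first occurrence of each clause across sentences, ignoring case."""
    seen = set()
    kept = []
    for sentence in sentences:
        for clause in split_clauses(sentence):
            key = clause.lower()
            if key not in seen:
                seen.add(key)
                kept.append(clause)
    return kept


if __name__ == "__main__":
    reports = [
        "The storm hit the coast on Monday, and thousands lost power.",
        "The storm hit the coast on Monday, but no one was hurt.",
    ]
    for clause in unique_clauses(reports):
        print(clause)
```

Working at the clause level lets the shared clause from the two reports be kept once while the differing clauses both survive, which is the partial-elimination problem the abstract attributes to sentence-level removal. Exact string matching stands in here for whatever similarity measure a real system would use.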
Similar resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Towards Multidocument Summarization by Reformulation: Progress and Prospects
By synthesizing information common to retrieved documents, multi-document summarization can help users of information retrieval systems to find relevant documents with a minimal amount of reading. We are developing a multidocument summarization system to automatically generate a concise summary by identifying and synthesizing similarities across a set of related documents. Our approach is uniqu...
Detecting Text Similarity over Short Passages: Exploring Linguistic Feature Combinations via Machine Learning
We present a new composite similarity metric that combines information from multiple linguistic indicators to measure semantic distance between pairs of small textual units. Several potential features are investigated and an optimal combination is selected via machine learning. We discuss a more restrictive definition of similarity than traditional, document-level and information retrieval-ori...
MR&MR-Sum: Maximum Relevance and Minimum Redundancy Document Summarization Model
We have presented an approach to automatic document summarization. In the proposed approach, text summarization is modeled as a quadratic integer-programming problem. This model generally attempts to optimize three properties, namely, (1) relevance: summary should contain informative textual units that are relevant to the user; (2) redundancy: summaries should not contain multiple textual units...
Improving the Performance of the Random Walk Model for Answering Complex Questions
We consider the problem of answering complex questions that require inferencing and synthesizing information from multiple documents and can be seen as a kind of topic-oriented, informative multi-document summarization. The stochastic, graph-based method for computing the relative importance of textual units (i.e. sentences) is very successful in generic summarization. In this method, a sentence...
Publication date: 2005